Bacterial DNA Sequence Compression Models Using Artificial Neural Networks

Authors

  • Manuel J. Duarte
  • Armando J. Pinho

Abstract

It is widely accepted that advances in DNA sequencing techniques have contributed to an unprecedented growth of genomic data. This has increased interest in DNA compression, not only from the points of view of information theory and biology, but also from a practical perspective, since such sequences require storage resources. Several compression methods exist. In particular, those using finite-context models (FCMs) have received increasing attention, as they have been shown to compress DNA sequences effectively, with a low number of bits per base and a low encoding/decoding time per base. However, the run-time memory required to store high-order finite-context models may become impractical: a context order as low as 16 already requires up to 17.2 × 10^9 memory entries (4^16 contexts, each with four symbol counters, i.e. 4^17 entries). This paper presents a method to reduce this memory requirement through a novel application of artificial neural networks (ANNs), which build such probabilistic models compactly, and shows how to use them to estimate the probabilities. Such a system was implemented, and its performance was compared against state-of-the-art compressors, such as XM-DNA (expert model) and FCM-Mx (mixture of finite-context models), as well as against general-purpose compressors. Using a combination of an order-10 FCM and an ANN, encoding results similar to those of FCMs of up to order 16 are obtained using only 17 megabytes of memory, whereas the latter, even when employing hash tables, use several hundreds of megabytes.
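
The memory figure above follows from the structure of an order-k FCM: over the alphabet {A, C, G, T}, a dense table holds 4^k contexts with four symbol counters each, so order 16 needs 4^17 ≈ 17.2 × 10^9 counters. The Python sketch below illustrates that arithmetic and how an adaptive FCM turns counts into probabilities and an ideal code length in bits per base. It assumes the additive ("alpha") estimator P(s | c) = (n_s + alpha) / (n + 4·alpha), a common choice for FCMs but an assumption here, and it covers only the FCM baseline, not the paper's ANN-based model. The names dense_table_entries, FiniteContextModel and bits_per_base are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"
INDEX = {s: i for i, s in enumerate(ALPHABET)}


def dense_table_entries(order: int, alphabet_size: int = len(ALPHABET)) -> int:
    """Counters in a dense order-k table: alphabet_size**k contexts x alphabet_size symbols."""
    return alphabet_size ** (order + 1)


class FiniteContextModel:
    """Adaptive order-k finite-context model for DNA (illustrative sketch).

    Counts live in a dict (a hash table), so only contexts actually observed
    consume memory. Probabilities use the additive estimator
    P(s | c) = (n_s + alpha) / (sum_a n_a + alpha * |A|), assumed for illustration.
    """

    def __init__(self, order: int, alpha: float = 1.0) -> None:
        self.order = order
        self.alpha = alpha
        self.counts = defaultdict(lambda: [0] * len(ALPHABET))

    def probability(self, context: str, symbol: str) -> float:
        # .get avoids creating a table entry just by querying an unseen context
        counts = self.counts.get(context, [0] * len(ALPHABET))
        total = sum(counts)
        return (counts[INDEX[symbol]] + self.alpha) / (total + self.alpha * len(ALPHABET))

    def update(self, context: str, symbol: str) -> None:
        self.counts[context][INDEX[symbol]] += 1


def bits_per_base(sequence: str, order: int, alpha: float = 1.0) -> float:
    """Ideal adaptive code length (what an arithmetic coder approaches), in bits per base."""
    model = FiniteContextModel(order, alpha)
    total_bits = 0.0
    for i, symbol in enumerate(sequence):
        # The first `order` symbols use the shorter context available so far
        context = sequence[max(0, i - order):i]
        total_bits += -math.log2(model.probability(context, symbol))
        model.update(context, symbol)
    return total_bits / len(sequence)


if __name__ == "__main__":
    print(dense_table_entries(16))  # 17179869184, i.e. about 17.2 x 10^9
    print(round(bits_per_base("ACGT" * 1000, order=2), 3))
```

Storing counts in a dict, as here, mirrors the hash-table approach the abstract mentions: memory grows with the number of distinct contexts actually observed rather than with 4^(k+1), which helps, but for long sequences and high orders the table can still become very large.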

Similar sources

Pareto Optimization of Two-element Wing Models with Morphing Flap Using Computational Fluid Dynamics, Grouped Method of Data handling Artificial Neural Networks and Genetic Algorithms

A multi-objective optimization (MOO) of two-element wing models with morphing flap by using computational fluid dynamics (CFD) techniques, artificial neural networks (ANN), and non-dominated sorting genetic algorithms (NSGA II), is performed in this paper. At first, the domain is solved numerically in various two-element wing models with morphing flap using CFD techniques and lift (L) and drag ...

DNA Sequence Classification Using Compression-Based Induction

Inductive learning methods, such as neural networks and decision trees, have become a popular approach to developing DNA sequence identification tools. Such methods attempt to form models of a collection of training data that can be used to predict future data accurately. The common approach to using such methods on DNA sequence identification problems forms models that depend on the absolute loc...

Monthly runoff forecasting by means of artificial neural networks (ANNs)

Over the last decade or so, artificial neural networks (ANNs) have become one of the most promising tools for modelling hydrological processes such as rainfall runoff processes. However, the employment of a single model does not seem to be an appropriate approach for modelling such a complex, nonlinear, and discontinuous process that varies in space and time. For this reason, this study aims at de...

Artificial Neural Networks (ANN) for the simultaneous spectrophotometric determination of fluoxetine and sertraline in pharmaceutical formulations and biological fluid

Simultaneous spectrophotometric estimation of Fluoxetine and Sertraline in tablets was performed using UV–Vis spectroscopy and Artificial Neural Networks (ANN). Absorption spectra of the two components were recorded in the 200–300 nm wavelength region at an interval of 1 nm. The calibration models were thoroughly evaluated at several concentration levels using the spectra of synthetic binary mix...

PREDICTION OF COMPRESSIVE STRENGTH AND DURABILITY OF HIGH PERFORMANCE CONCRETE BY ARTIFICIAL NEURAL NETWORKS

Neural networks have recently been widely used to model some of the human activities in many areas of civil engineering applications. In the present paper, artificial neural networks (ANN) for predicting compressive strength of cubes and durability of concrete containing metakaolin with fly ash and silica fume with fly ash are developed at the age of 3, 7, 28, 56 and 90 days. For building these...

Journal title:
  • Entropy

Volume 15

Publication date: 2013